Learning from Partially Annotated Sequences
Abstract
We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could, for instance, be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.
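The abstract names the learner but does not spell out its details, so the following is one plausible reading, not the authors' implementation. It assumes a first-order linear tagger with hypothetical emission and transition features. The transductive step is read as completing each partially annotated sentence by constrained Viterbi decoding under the current weights, with annotated tags held fixed. Loss augmentation is applied only on annotated tokens, using a hypothetical per-error `cost`. The names `train`, `viterbi`, `features`, `START`, and the toy data are illustrative.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Sequence, Tuple

START = -1  # pseudo-tag for the transition into the first token
Weights = DefaultDict[tuple, float]


def score_local(w: Weights, words: Sequence[str], i: int, prev: int, tag: int) -> float:
    # Assumed feature scheme: one emission weight per (tag, word) and one
    # first-order transition weight per (prev, tag).
    return w[("E", tag, words[i])] + w[("T", prev, tag)]


def viterbi(w, words, n_tags, allowed=None, bonus=None) -> List[int]:
    """Highest-scoring tag sequence under the linear model.

    allowed[i]: tags permitted at position i (None means every tag).
    bonus[i][t]: extra score for tag t at position i (used for loss augmentation).
    """
    n = len(words)
    if n == 0:
        return []
    # best[i][t] = (best score of a prefix ending in tag t, previous tag)
    best: List[Dict[int, Tuple[float, Optional[int]]]] = [{} for _ in range(n)]
    for i in range(n):
        tags = range(n_tags) if allowed is None or allowed[i] is None else allowed[i]
        for t in tags:
            extra = bonus[i][t] if bonus is not None else 0.0
            if i == 0:
                best[i][t] = (score_local(w, words, 0, START, t) + extra, None)
            else:
                best[i][t] = max(
                    (best[i - 1][p][0] + score_local(w, words, i, p, t) + extra, p)
                    for p in best[i - 1]
                )
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(n - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def features(words: Sequence[str], tags: Sequence[int]) -> Dict[tuple, float]:
    # Count the emission and transition features fired by one tagged sequence.
    f: Dict[tuple, float] = defaultdict(float)
    prev = START
    for word, tag in zip(words, tags):
        f[("E", tag, word)] += 1.0
        f[("T", prev, tag)] += 1.0
        prev = tag
    return f


def train(data, n_tags: int, epochs: int = 10, cost: float = 1.0) -> Weights:
    """data: (words, labels) pairs; labels[i] is a tag id, or None if unannotated."""
    w: Weights = defaultdict(float)
    for _ in range(epochs):
        for words, labels in data:
            # Transductive step (assumption): complete the unannotated tokens with
            # the current model while keeping every annotated tag fixed.
            allowed = [None if y is None else [y] for y in labels]
            target = viterbi(w, words, n_tags, allowed=allowed)
            # Loss augmentation: add `cost` to wrong tags on annotated tokens only.
            bonus = [[0.0 if y is None or t == y else cost for t in range(n_tags)]
                     for y in labels]
            pred = viterbi(w, words, n_tags, bonus=bonus)
            # Perceptron update only when the prediction contradicts an annotation.
            if any(y is not None and p != y for p, y in zip(pred, labels)):
                for k, v in features(words, target).items():
                    w[k] += v
                for k, v in features(words, pred).items():
                    w[k] -= v
    return w


# Toy data: tag 1 = person name, tag 0 = other; None marks an unannotated token.
data = [(["alice", "met"], [1, None]), (["bob", "met"], [None, 0])]
w = train(data, n_tags=2)
print(viterbi(w, ["alice", "met"], n_tags=2))  # [1, 0]
```

Updating only when the prediction contradicts an annotated token is a design choice of this sketch. Under it, unannotated tokens never trigger an update on their own. They affect the weights only through the feature vectors of the completed target and of the prediction.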
Similar resources
Interval Insensitive Loss for Ordinal Classification
We address the problem of learning an ordinal classifier from partially annotated examples. We introduce an interval-insensitive loss function to measure the discrepancy between the predictions of an ordinal classifier and a partial annotation provided in the form of intervals of admissible labels. The proposed interval-insensitive loss is an instance of loss functions previously used for learning of differ...
Ating It Points of View or Opinion, Stated Do Not Necessarily Repre
This document is a partially annotated bibliography of publications of the Learning Research and Development Center of the University of Pittsburgh. The sections of the bibliography are: Publications: 1972 (annotated); Publications: 1971 (annotated); Reprints: 1964-1970 (partially annotated); Working Papers: 1965-1970 (partially annotated); Technical Reports: 1966-1970; Monographs: 1970 (...
Discriminative learning from partially annotated examples
The number of algorithms and applications for learning classifiers automatically from examples is ever growing. Most existing algorithms require a training set of completely annotated examples, which are often hard to obtain. In this thesis, we tackle the problem of learning from partially annotated examples, which means that each training input comes with a set of admissible labels only one o...
Structured Learning from Partial Annotations
Structured learning is appropriate when predicting structured outputs such as trees, graphs, or sequences. Most prior work requires the training set to consist of complete trees, graphs or sequences. Specifying such detailed ground truth can be tedious or infeasible for large outputs. Our main contribution is a large margin formulation that makes structured learning from only partially annotate...
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies
Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive...
Publication date: 2011